Multiple spaced seeds for homology search
Abstract
Motivation: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smith-Waterman sensitivity is approached at BLASTn speed. However, computing optimal multiple spaced seeds was proved to be NP-hard and current heuristic algorithms are all very slow (exponential).
Results: We give a simple algorithm which computes good multiple seeds in polynomial time. Due to a completely different approach, the difference with respect to the previous methods is dramatic. The multiple spaced seed of PatternHunterII, with 16 weight 11 seeds, was computed in 12 days. It takes us 17 s to find a better one. Our approach changes the way of looking at multiple spaced seeds.
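As a minimal illustration of the terms used in the abstract (and not the authors' polynomial-time algorithm, which the abstract does not describe), the Python sketch below shows what it means for a spaced seed to hit a homology region and how the sensitivity of a set of seeds could be computed by brute force. The function names `hits` and `sensitivity`, the toy seeds, and the model in which each alignment column matches independently with probability `p` are assumptions made for this example only.

```python
from itertools import product


def hits(seed: str, region: str) -> bool:
    """Return True if the spaced seed hits the region at some offset.

    seed:   '1' marks a position that must match, '0' a don't-care position.
    region: '1' marks a matching alignment column, '0' a mismatch.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(region) - len(seed) + 1):
        if all(region[start + i] == "1" for i in required):
            return True
    return False


def sensitivity(seeds, length, p):
    """Probability that at least one seed hits a random region.

    Assumes (for illustration) that every column matches independently
    with probability p. Enumerates all 2**length regions, so this is
    exponential and only usable for toy sizes.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        region = "".join(bits)
        if any(hits(s, region) for s in seeds):
            k = region.count("1")
            total += p ** k * (1 - p) ** (length - k)
    return total


if __name__ == "__main__":
    # The spaced seed 101 tolerates the mismatch that blocks the contiguous seed 111.
    print(hits("101", "11011"), hits("111", "11011"))
    # Adding a second seed to the set cannot lower the sensitivity.
    print(sensitivity(["111"], 8, 0.7), sensitivity(["111", "1101"], 8, 0.7))
```

The exhaustive enumeration is deliberately naive: it makes the definition of sensitivity concrete, and it hints at why choosing seeds by evaluating sensitivity exactly becomes expensive as regions and seed sets grow.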
Related papers
Fast Computation of Good Multiple Spaced Seeds
Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of computing power in the world is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search and it has been adopted by many alignment programs such as BLAST. With the...
Sensitivity analysis and efficient method for identifying optimal spaced seeds
The novel introduction of the spaced seed idea in the filtration stage of sequence comparison by Ma et al. (Bioinformatics 18 (2002) 440) has greatly increased the sensitivity of homology search without compromising the speed of search. Finding the optimal spaced seeds is of great importance both theoretically and in designing better search tools for sequence comparison. In this paper, we study the ...
Optimizing Multiple Spaced Seeds for Homology Search
Optimized spaced seeds improve sensitivity and specificity in local homology search. Several authors have shown that multiple seeds can have better sensitivity and specificity than single seeds. We describe a linear programming (LP)-based algorithm to optimize a set of seeds. Theoretically, our algorithm offers a performance guarantee: the sensitivity of a chosen seed set is at least 70% of wha...
On the complexity of the spaced seeds
Optimal spaced seeds were introduced by the theoretical computer science community to bioinformatics to effectively increase homology search sensitivity. These seeds are serving many homology queries daily. However, the computational complexity of finding the optimal spaced seeds remains open. In this paper, we prove that computing hit probability of a spaced seed in a uniform homology reg...
Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping
Spaced seeds have been shown to be superior to continuous seeds for efficient and sensitive homology search based on the seed-and-extend paradigm. Much the same is true in genome mapping of high-throughput short-read data. However, a highly sensitive search with multiple spaced patterns often requires the use of a great amount of index data. We propose a novel seed-set construction method for ef...
Journal title:
- Bioinformatics
Volume 23, Issue 22
Publication date: 2007